Image of Bike

MetroBike Austin: Vision for the Future

Authors: Christian Lee, Joe Niehaus, Jason Petri, Jordan Pflum

Abstract:

Over the last five years, “micromobility” companies, which rent out small, lightweight vehicles such as scooters and bikes for short periods of time, have proliferated. The City of Austin operates its own such service, MetroBike, with bike stations in dozens of locations around the city. Austin’s enjoyable climate and extensive array of activities make the city’s shareable bikes popular among local commuters and tourists alike. However, Austin has a very diverse population that is segregated in many ways into specific areas of town. Moreover, crime is more common in some of these areas than in others. Our question is: do these factors, and others, affect MetroBike’s ridership numbers? Do certain stations require more bicycles than others?

In this project, our team seeks to quantify the relationship between crime and ridership for a given station, and to build upon this by identifying other useful features that may impact the number of rides that originate from any one station. If we are able to identify such features, and those features have a statistically significant impact on ridership at a given station, we hope that our research could help MetroBike better service its stations and ensure that enough bikes are available at a given time. Not only does this lead to a better experience for patrons of the company, but it should also allow the company to operate more efficiently.

Problem & Importance

As stated, the problem we are addressing is the potentially inefficient management of MetroBike’s ridesharing stations around the city of Austin. This problem was identified through the inspection of several stations around the city where MetroBike’s shareable bicycles sat idle. As each bike is rented for a very short period of time, the use of said bikes represents revenue for the company and in effect the city. Idle bikes represent lost revenue.

But why do these bikes sit idle? Are there characteristics of the surrounding area that prevent commuters and tourists from using the bikes? Is it due to the crime in the area? How much does weather play a role in the rental of the bikes in that area? In other words, are the bikes used to get to work, in which case a cloudy, overcast day is unlikely to play a role in the popularity of a station, or are they used primarily for recreation? Is there a festival, a football game, or some other popular event occurring on that day and in that area?

Each of these questions represents a potential feature that can be added to our data and eventually modeled to understand, and hopefully predict, the frequency of rides originating at a specific MetroBike station. The importance of this is clear from the initial exploration of the problem. If we can understand the relationship between each of these features and MetroBike’s ridership, we can hopefully predict the ridership on a given day at a given station. If we can do this successfully, MetroBike can expend its resources more efficiently, stocking up stations that are likely to be busy and thereby pulling in more revenue than it would if it left bikes in unused locations. This is a common exercise for businesses that rely on the efficient expenditure of resources to facilitate their customers’ behavior, saving money while potentially bringing in more revenue.

GPS Map

Approach and Rationale

After joining the MetroBike datasets, we are looking at three possible pieces of intelligence we can derive from our data. Each of these insights will be found using a regression analysis of our data. To do this, we will join the Bike share, Weather and Crime tables on date.
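
The date join described above could look roughly like the following pandas sketch. The table layouts and column names (`rides`, `avg_temp_f`, `precip_in`, `crime_count`) are hypothetical stand-ins, not the actual schema, and treating a day with no crime row as zero crimes is an assumption.

```python
import pandas as pd

# Hypothetical daily tables; column names are illustrative, not the real schema.
rides = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
    "rides": [412, 530, 388],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
    "avg_temp_f": [51.0, 47.5, 55.2],
    "precip_in": [0.0, 0.12, 0.0],
})
crime = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-03"]),
    "crime_count": [7, 4],
})


def build_master(rides, weather, crime):
    """Left-join weather and crime onto the daily ride counts by date."""
    # validate= raises if either side has duplicate dates, which would
    # silently multiply rows in the joined table.
    master = rides.merge(weather, on="date", how="left", validate="one_to_one")
    master = master.merge(crime, on="date", how="left", validate="one_to_one")
    # Assumption: a day with no matching crime row had zero reported crimes.
    master["crime_count"] = master["crime_count"].fillna(0).astype(int)
    return master


master = build_master(rides, weather, crime)
```

A left join keeps every day that has ride data even when one of the other tables is missing that date, so gaps show up as missing values rather than as dropped rows.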

From this master database, we will seek to run a regression with the goal of determining the most popular stations from which users are likely to initiate a bike share. The station from which a ride was initiated will be our dependent variable and the features such as weather and crime on a given day will be our independent variables.

Clearly, the number of features will need to be whittled down to reduce dimensionality and prevent overfitting. Our reduction of features will likely be done using a mixture of manual reduction of features (i.e. removing features we do not think will play a role in our analysis, or may be redundant across datasets), as well as with lasso feature selection. Prior to conducting the lasso feature selection, we will initialize weights using the sum of squares method. Once we have initialized our weights, we will then penalize the model’s number of features using lasso, hopefully to the point where certain features which have been deemed unimportant will be dropped from our model.

With our dataset in place, and our features narrowed down, we will seek to look at how certain changes in crime and weather impact the likelihood of a rider choosing to redeem a bike share at a certain location. Does an increase in crime in a certain location reduce the likelihood of a rider using bike share? Does the weather play an important role in that location’s popularity (i.e. are there riders who need rides in that location regardless of the weather or is it a location that is only utilized on good weather days)?

Finally, can our model predict ridership? If so, MetroBike will know when and how completely to replenish their stations with bicycles for the upcoming day's rides. It can also indicate areas that would be prime candidates for new stations given the amount of demand in those locations.

Data Collection

The datasets available for our problem are all specific to the City of Austin. We used four datasets total:

Crime Data

The largest set contains 18 years of public crime reports from the Austin Police Department. This is a public dataset, available on the data.austintexas.gov website. It is updated on a weekly basis and contains a record of incidents that the Austin Police Department responded to and filed reports on. Though this data is available from 2002 onward, we will be primarily reviewing data from 2013 through 2017, coinciding with the available data from our most limited dataset.

Interactive map: Violent Crimes in Austin (2013-2017), with a slider to pinpoint crimes in a certain month

Weather Data

We will be joining the Crime dataset with publicly available weather data, obtained from Weather Underground, and collected from the Austin KATT Station (located at Camp Mabry in West-Central Austin). This data primarily includes historical temperature, precipitation, humidity, and wind speed for the City of Austin and local suburbs.

As you’d expect, the data needs some cleaning. The file has two quirks we need to handle:

  • Some precipitation values are listed as ‘T’, meaning trace. This is a very small amount of rain, but not enough to register a value in inches. We replaced these with a value of 0.01 inches, so the entire column is numeric.

  • The events column is a string with a concatenation of ‘Fog’, ‘Rain’, ‘Snow’, and ‘Thunderstorm’, depending on which conditions were present. Here we created new boolean columns, one per condition, each set to True if that condition occurred on that day.
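
The two cleaning steps above can be sketched as follows. The column names `precip_in` and `events`, the separator used inside the events string, and the `is_*` flag names are assumptions for illustration; only the ‘T’ to 0.01 substitution and the four condition names come from the description above.

```python
import pandas as pd

# Hypothetical raw rows; column names and separators are assumptions.
raw = pd.DataFrame({
    "precip_in": ["0.00", "T", "0.35", "T"],
    "events": ["", "Fog", "Rain , Thunderstorm", None],
})

EVENT_TYPES = ["Fog", "Rain", "Snow", "Thunderstorm"]


def clean_weather(df):
    """Make precipitation numeric and expand the events string into flags."""
    out = df.copy()
    # 'T' marks trace precipitation; substitute 0.01 inches so the whole
    # column can be parsed as numbers.
    out["precip_in"] = pd.to_numeric(out["precip_in"].replace("T", "0.01"))
    # One boolean flag per condition that appears in the events string.
    events = out["events"].fillna("")
    for event in EVENT_TYPES:
        out[f"is_{event.lower()}"] = events.str.contains(event, regex=False)
    return out


weather = clean_weather(raw)
```

Substring matching is enough here because none of the four condition names is contained in another.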

Bike Trip Datasets

Finally, we will be measuring our publicly available data against information on nearly 650,000 Bike Share Trips within the city, made available by the City of Austin for trips from 2013 through 2017. The original dataset is available from Google Public Data. Bike shares are a service offered by a handful of providers (the most prolific in Austin being Austin B-Cycle) and are becoming a popular alternative means of transportation.

This data includes information on bike trip start location, stop location, duration and type of bike share user. An additional dataset includes bike station location data by latitude and longitude, as well as location operating status. We joined these tables on the individual station ID in order to capture latitude and longitude data for each ride.

Interactive map: MetroBike station locations (hovering over a station shows its location)

Pre-Processing

Detrending

  • A major concern when working with time series data is controlling for non-stationary data. Because our goal is to accurately model the evolution of a time series (in this case, bike ridership) with respect to other observable features, we need to remove any change in the mean of the data over time. Detrending bike ridership will allow us to more accurately identify subtrends in the time series and which factors influence ridership the most.

  • To detrend the data, we fit a linear regression of daily ridership on time and recorded the constant and slope of the line, as well as their respective t-statistics. The slope was 0.1824 with a t-statistic of 6.392, which is significantly different from 0. This means that, on average, ridership increased by 0.1824 rides per day. Since there was a significant trend in ridership, we detrended the data by subtracting the cumulative trend from every data point. The regression results are shown below:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.032
Model:                            OLS   Adj. R-squared:                  0.031
Method:                 Least Squares   F-statistic:                     40.86
Date:                Fri, 11 Dec 2020   Prob (F-statistic):           2.31e-10
Time:                        22:01:57   Log-Likelihood:                -9206.4
No. Observations:                1257   AIC:                         1.842e+04
Df Residuals:                    1255   BIC:                         1.843e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        401.3402     20.703     19.386      0.000     360.725     441.956
x1             0.1824      0.029      6.392      0.000       0.126       0.238
==============================================================================
Omnibus:                      776.814   Durbin-Watson:                   0.480
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             7611.291
Skew:                           2.785   Prob(JB):                         0.00
Kurtosis:                      13.691   Cond. No.                     1.45e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.45e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
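
A minimal sketch of the detrending step, assuming the time index is simply the day number 0, 1, 2, ... and using `numpy.polyfit` in place of the OLS fit shown above. The synthetic series is generated from the reported coefficients plus invented noise purely for illustration, and this sketch does not compute t-statistics.

```python
import numpy as np


def detrend(y):
    """Fit y = const + slope * t by least squares and subtract slope * t."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, const = np.polyfit(t, y, deg=1)
    # Remove only the cumulative trend so the series keeps its baseline level.
    return y - slope * t, const, slope


# Synthetic series built from the reported coefficients (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(1257)
rides = 401.34 + 0.1824 * t + rng.normal(0, 50, size=t.size)

detrended, const, slope = detrend(rides)
```

Refitting a line to `detrended` should give a slope of essentially zero, which is a quick check that the trend was removed.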

Feature Engineering & Exploration

Weather
  • We lagged each of the following features back one day to avoid forward looking in our prediction. Additionally, we took rolling moving averages of each of the features in 1 day, 3 day, 1 week and 1 month windows.
    • Average Sea Level Pressure in Inches
    • Average Wind Pressure
    • Average Visibility in Miles
    • Average Dew Point
    • Average Temperature in Fahrenheit
    • Sum of Precipitation in Inches
    • Notable Weather Events
  • This gave us a total of 28 weather features.
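
The lag-then-average construction above might be sketched as below. The 30-day month, the column name `avg_temp_f`, and the use of a mean for every feature (the precipitation feature is described as a sum) are simplifying assumptions.

```python
import pandas as pd

# Trailing window lengths in days; the 30-day "month" is an assumption.
WINDOWS = {"1d": 1, "3d": 3, "1w": 7, "1m": 30}


def lagged_rolling_features(daily, columns):
    """Shift each column back one day, then average over trailing windows."""
    # After the shift, the row for day t only contains information from
    # day t-1 and earlier, so no same-day values leak into the features.
    lagged = daily[columns].shift(1)
    parts = []
    for label, days in WINDOWS.items():
        rolled = lagged.rolling(window=days, min_periods=days).mean()
        parts.append(rolled.add_suffix(f"_avg_{label}"))
    return pd.concat(parts, axis=1)


daily = pd.DataFrame(
    {"avg_temp_f": [50.0, 52.0, 54.0, 56.0, 58.0]},
    index=pd.date_range("2014-01-01", periods=5, freq="D"),
)
features = lagged_rolling_features(daily, ["avg_temp_f"])
```

With seven input columns and four windows this yields 28 columns, which matches the feature count stated above.
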
Events
  • ACL, SXSW, UT Home Football Games, Major US Holidays
    • Just as ridership varies by weekday/weekend, we expect these events to correspond to higher average rides
Crime
  • We want to quantify crimes affecting MetroBikes. We would assume the distance of the crime from a station would play a role in ridership to or from that station (i.e. a station with a lot of crime nearby would have lower ridership than an equivalent station without a lot of crime around it).
    • We varied the maximum distance between a crime and a MetroBike station for the crime to be counted: 50m, 100m, 250m, and 500m.
  • Additionally, we assumed that as time passed, people would gradually begin to forget about the crime. Said a different way, people’s memory is finite and has a window of remembering. To account for this, we varied the window to count a crime occurring around a station, and summed all valid crimes that fit the criteria pairs laid out above for every day.
  • For example, on 01/01/2014, we quantify crime affecting metro bikes with the following features:

    • Sum crime within 50m, 100m, 250m, 500m of any given station in the past 1 day, 3 days, 1 week, 1 month.

    • The map below displays the stations with a 500m radius drawn around them. Any crime occurring within this circle would be counted as affecting MetroBikes for the relevant crime features.

Interactive map: MetroBike stations, each with a 500m radius drawn around it
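
One way to compute the radius-and-window crime counts described above is sketched here. The haversine distance, the `lat`/`lon`/`date` column names, the 30-day month, and the sample coordinates are all assumptions; the original distance calculation is not shown in this write-up.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0
RADII_M = (50, 100, 250, 500)
WINDOWS_DAYS = (1, 3, 7, 30)  # a 30-day "month" is an assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; inputs in degrees, broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def crimes_near_stations(crimes, stations, day, radius_m, window_days):
    """Count crimes within radius_m of any station in the days before `day`."""
    day = pd.Timestamp(day)
    start = day - pd.Timedelta(days=window_days)
    recent = crimes[(crimes["date"] >= start) & (crimes["date"] < day)]
    if recent.empty:
        return 0
    # Distance matrix: one row per crime, one column per station.
    dist = haversine_m(
        recent["lat"].to_numpy()[:, None], recent["lon"].to_numpy()[:, None],
        stations["lat"].to_numpy()[None, :], stations["lon"].to_numpy()[None, :],
    )
    return int((dist.min(axis=1) <= radius_m).sum())


# Hypothetical sample: one station and three crimes at roughly 33 m,
# 311 m and 3.6 km due north of it.
stations = pd.DataFrame({"lat": [30.2672], "lon": [-97.7431]})
crimes = pd.DataFrame({
    "date": pd.to_datetime(["2013-12-31", "2013-12-29", "2013-12-31"]),
    "lat": [30.2675, 30.2700, 30.3000],
    "lon": [-97.7431, -97.7431, -97.7431],
})

features = {
    f"crime_{r}m_{w}d": crimes_near_stations(crimes, stations, "2014-01-01", r, w)
    for r in RADII_M for w in WINDOWS_DAYS
}
```

Each (radius, window) pair becomes one feature, so four radii and four windows give 16 crime features per day. The window excludes the day itself, consistent with the one-day lag used for the weather features.
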
Ridership
  • Using data on past ridership, we pulled in the average number of rides in the past 1 day, 3 days, 1 Week, and 1 Month, as well as the average duration of rides in the same periods.

Below is the frequency of rides per day.

Time
  • Month of the Year
    • The month helps to capture some seasonal weather patterns, as well as people's behaviors. For example, the average non-commuter is more likely to bike in the spring and summer months than the winter months.
  • Week of the Year
    • Although we have accounted for major events in Austin that would increase ridership, we will inevitably miss some. Because many events are seasonal and occur roughly the same time every year, mapping each week (0,1,2..51) may account for that. This may also account for recurring events in the university calendar, such as Spring, Summer and Winter Breaks.
  • Day of the Week
    • There is a clear increase in ridership on the weekends, as we could expect. Mapping each date to its weekday allows us to train the model to understand this imbalance.
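
A sketch of how the month, week, and weekday indicators could be built with pandas. It uses ISO week numbers (1 to 53), which is an assumption and differs from the 0-based indexing mentioned above; the column naming comes from `pandas.get_dummies` and is not necessarily what the original notebook used.

```python
import pandas as pd


def time_features(dates):
    """One-hot encode month, ISO week number, and day of week for each date."""
    dates = pd.DatetimeIndex(dates)
    frame = pd.DataFrame({
        "month": dates.month.to_numpy(),
        # ISO week numbers run from 1 to 53 (assumption: ISO numbering).
        "week": dates.isocalendar().week.astype(int).to_numpy(),
        "weekday": dates.dayofweek.to_numpy(),  # Monday=0 ... Sunday=6
    }, index=dates)
    return pd.get_dummies(frame, columns=["month", "week", "weekday"], dtype=bool)


features = time_features(["2014-03-14", "2014-03-15", "2014-10-04"])
```

Encoding each value as its own boolean column lets a linear model learn a separate offset for, say, Saturdays or a single week in March, rather than forcing a straight-line relationship with the week number.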

ACL

Learning and Modeling

Feature Selection (LASSO And Ridge Regression)

Neural Network

Chosen models and why training methods


  • Base model: Before starting any of our machine learning models, we will establish a baseline. This gives us a benchmark of performance to compare our model against.

  • Time-based regression: We’ll use a linear model to predict the rentals based on the month of the year, day of the month, day of the week, and hour of day.

  • Daily weather and time-based regression: We’ll take the time-based model, and combine this with the daily weather conditions. We hypothesize that the number of rentals will change alongside weather conditions.

  • Feature selection: Once we have all our features, we’ll verify which ones are most impactful to our model.

  • Hyperparameter tuning: We’ll look at both Ridge and Lasso models, which use regularization to help us generalize our model to new data.

Feature DF

Baseline Model (Full MLR Model)

The test root mean squared error is 213.02
The training root mean squared error is 166.01

Above, we see that there is some deviation from a perfect regression. On average, we are off by 213 rides per day in the testing set. In the training set, the error is much lower at 166 rides. This leads to the suspicion that we may have overfitted the training data. Later, we attempt to create a sparser model with fewer features.
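
The train/test gap described above can be reproduced on synthetic data. The sketch below fits ordinary least squares with NumPy on an invented dataset with many features relative to its sample size; the data, the chronological split, and the helper names are assumptions and say nothing about the real feature matrix.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via lstsq."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


# Synthetic data: 60 features but only one carries signal, so the full
# model has plenty of room to fit noise in the training rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))
y = 500 + 80 * X[:, 0] + rng.normal(0, 150, size=200)

split = 150  # assumption: earlier rows train, later rows test
coef = fit_linear(X[:split], y[:split])
train_rmse = rmse(y[:split], predict(coef, X[:split]))
test_rmse = rmse(y[split:], predict(coef, X[split:]))
```

When the training error falls below the noise level while the test error sits above it, the model is fitting noise, which is the pattern that motivates the sparser models that follow.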

It is worth showing the impact of each feature on model output at this baseline stage. As features are dropped, the relevant features will change. Currently, the categorical feature indicating whether or not the ride is in March has tremendous model impact.

Feature Selection

LASSO
The best value of alpha is 1.0344827586206897
The RMSE score associated with the best alpha (for training) is 201.29316141404834
The RMSE score associated with the best alpha (for testing) is 186.89505776276698

Lasso regression resulted in an RMSE of 201.29 for the training set and 186.90 for the testing set. At an alpha of 1.0345, Lasso eliminated all but 8 of the features. These features, their feature type, and weight (ranked by importance) were ACL (Bool) [336.70], South by Southwest (Bool) [266.15], Week 11 (Bool) [210.73], Saturday (Bool) [194.49], Friday (Bool) [48.00], March (Bool) [18.59], Average Rides 1-Day Lag (numeric) [0.59], crime 1-Day Lag 500m radius (numeric) [0.22].
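
A sketch of lasso-based feature selection with scikit-learn, on synthetic data. The alpha grid, the use of 5-fold cross-validation to choose alpha, and the feature scaling are all assumptions; how the alpha reported above was actually selected is not described here.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 300, 40
X = rng.normal(size=(n, p))
# Only the first three synthetic features carry signal.
y = 500 + 120 * X[:, 0] + 60 * X[:, 1] - 40 * X[:, 2] + rng.normal(0, 50, size=n)

X_train, X_test = X[:225], X[225:]
y_train, y_test = y[:225], y[225:]

# Scale features so the L1 penalty treats every coefficient comparably.
scaler = StandardScaler().fit(X_train)

alphas = np.linspace(0.5, 30, 60)  # hypothetical search grid
model = LassoCV(alphas=alphas, cv=5, max_iter=10_000)
model.fit(scaler.transform(X_train), y_train)

# Features whose coefficients were not shrunk to exactly zero.
kept = np.flatnonzero(model.coef_)

pred = model.predict(scaler.transform(X_test))
test_rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
```

The L1 penalty drives coefficients of uninformative features to exactly zero, so the indices in `kept` form the sparse feature set, and `model.alpha_` holds the selected penalty strength.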

Ridge Regression
The best value of alpha is 45.0
The RMSE score associated with the best alpha (for training) is 187.5055411932729
The RMSE score associated with the best alpha (for testing) is 197.9214544315465

Ridge regression resulted in an RMSE of 187.51 for the training set and 197.92 for the testing set. At an alpha of 45.0, Ridge ranked all the features by importance. The top 8 of these features, their feature type, and weight (ranked by importance) were Saturday (Bool) [199.06], Week 11 (Bool) [73.96], South by Southwest (Bool) [62.53], is March (Bool) [62.22], ACL (Bool) [47.88], whether a Weather Event Occurred (Bool) [34.43], is Friday (Bool) [34.00], is week 40 (numeric) [26.00].

Neural Network

We attempted a neural network using all features, then the top features from the lasso regression, along with a wide variety of hyperparameters: batch size, number of epochs, learning rate, decay rate, number of layers, nodes per layer, and dropout layers. We iterated methodically, changing the hyperparameters to reach progressively better models.

The image above shows the training and testing loss across epochs during model training.

The image above shows the testing squared error across epochs. A few hyperparameter settings resulted in lower squared errors, effectively getting the model out of local minima. The most significant tweaks were to the learning rate.

The best neural network had a mean squared error, on the test set, of 182 bikes. On the training set, the mean squared error was 174. These results were significantly worse than simple lasso or ridge regressions.

The neural network was retrained on just the non-zero features provided by the lasso regression, and this did not lead to any further improvement.

Results

After attempting multiple linear regression across all features, we established a baseline model that had, on average, around 213 unexplained trips per day. Compared with the average of 538 bike trips per day, our loss is adequate for a problem of this complexity. Most of our loss is attributed to the abnormally high ridership days. We expected the features we gathered, such as whether or not the day had an event, to account for these days. It seems that our models did not fully meet these expectations.

After detrending the data, our model losses improved significantly. This aspect of the analysis was worthwhile.

Lasso regression outperformed the other models, not only by lowering loss but also by defining a sparse model with a small number of features. Surprisingly, lasso outperformed neural networks that included interaction between variables and non-linear activations. If more analysis were to follow, new features could be introduced to capture the outlier days.

Conclusions

Future Considerations

By using weather data reported hourly instead of daily, we can use the conditions in a more fine-grained way. This could distinguish between cases where it rained in the morning and cleared up in the afternoon, compared to raining all day.

References

Weather Underground

WeatherUnderground - Austin KATT Station

https://www.wunderground.com/?cm_ven=cgi

The dataset above originates from Weather Underground, an IBM company. As described above, this dataset includes historical weather data for the Austin, Texas area. Coupled with our crime datasets, it may reveal insight into the relationship between crime and weather patterns in the Austin area.

MetroBike

MetroBike - Austin B-Cycle Stations

https://austin.bcycle.com/stations

The MetroBike dataset includes geolocation data for the metropolitan bike sharing initiative. These data could provide insight into typical traffic patterns and traveling preferences around the city.

Austin Crime

https://data.austintexas.gov/Public-Safety/Crime-Reports/fdj4-gpfu/

City of Austin Crime Reports

The City of Austin provides a data portal to obtain records of incidents received by the Austin Police Department. This public safety dataset provides current data that is updated weekly.

Linear Relationships and Weather Patterns

https://nycdatascience.com/blog/r/using/

Using Linear Regression to Predict Weather Patterns

The article above employs linear regression to quantify the trends of weather patterns. Their research concludes that weather is a nonlinear system. This understanding will have implications for model selection and pre-training feature modifications.
